{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* [Caicai's complete sklearn course (2020): building a tree](https://www.bilibili.com/video/BV1MA411J7wm?p=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import graphviz\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn import tree  # decision tree module\n",
    "from sklearn.datasets import load_wine  # wine dataset\n",
    "from sklearn.model_selection import train_test_split  # function that splits a dataset into a training set and a test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "wine = load_wine()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `data`: the feature values\n",
    "* `target`: the label values\n",
    "* `feature_names`: the names of the features\n",
    "* `target_names`: the names of the labels"
   ]
  },
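  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The four attributes listed above can be combined into a single labelled table. This is a minimal sketch, assuming only the `load_wine` bunch; the variable name `wine_df` and the column name `target` are illustrative choices, not part of the original notebook.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from sklearn.datasets import load_wine\n",
    "\n",
    "wine = load_wine()\n",
    "# Use the real feature names as column headers instead of 0..12\n",
    "wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)\n",
    "# Append the class label as a clearly named column (name chosen for illustration)\n",
    "wine_df['target'] = wine.target\n",
    "print(wine_df.shape)\n",
    "print(list(wine.target_names))\n",
    "```\n",
    "\n",
    "Naming the columns avoids having two columns both called `0` when features and labels are joined."
   ]
  },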
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>10</th>\n",
       "      <th>11</th>\n",
       "      <th>12</th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>14.23</td>\n",
       "      <td>1.71</td>\n",
       "      <td>2.43</td>\n",
       "      <td>15.6</td>\n",
       "      <td>127.0</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3.06</td>\n",
       "      <td>0.28</td>\n",
       "      <td>2.29</td>\n",
       "      <td>5.64</td>\n",
       "      <td>1.04</td>\n",
       "      <td>3.92</td>\n",
       "      <td>1065.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      0     1     2     3      4    5     6     7     8     9     10    11  \\\n",
       "0  14.23  1.71  2.43  15.6  127.0  2.8  3.06  0.28  2.29  5.64  1.04  3.92   \n",
       "\n",
       "       12  0   \n",
       "0  1065.0   0  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wine.data.shape    # shape of the feature data\n",
    "wine.target.shape  # shape of the label data\n",
    "\n",
    "# Join the features and the labels side by side, aligned on the row index\n",
    "pd.concat([pd.DataFrame(wine.data),pd.DataFrame(wine.target)],axis=1).head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
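    "### Splitting the dataset: training set and test set\n",
    "### Basic model-building workflow\n",
    "\n",
    "* The three-step sklearn modelling recipe\n",
    "* The two core problems a decision tree algorithm has to solve:\n",
    "\n",
    "    1. How to find the best node and the best split in the data table?\n",
    "    2. How to stop the tree from growing, so that it does not overfit?\n",
    "    \n",
    "    \n",
    "* The `tree` module\n",
    "    1. `tree.DecisionTreeClassifier`: classification tree\n",
    "    2. `tree.DecisionTreeRegressor`: regression tree\n",
    "    3. `tree.export_graphviz`: exports a fitted tree in DOT format, used for plotting"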
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Splitting the dataset\n",
    "```python\n",
    "X_train,X_test,y_train,y_test = train_test_split(wine.data, wine.target, test_size=0.3)  # the test set is 30% of the data\n",
    "```\n",
    "\n",
    "### The three modelling steps\n",
    "```python\n",
    "clf = tree.DecisionTreeClassifier(criterion='gini') # 1. instantiate the model\n",
    "clf = clf.fit(X_train,y_train)      # 2. train the model on the training set through the fit interface\n",
    "score = clf.score(X_test,y_test)    # 3. query the fitted model; score returns the accuracy on the test set\n",
    "score\n",
    "```\n",
    "### Building a tree\n",
    "```python\n",
    "# Chinese display names for the 13 features, in the same order as wine.feature_names\n",
    "feature_name = ['酒精','苹果酸','灰','灰的碱性','镁','总酚','类黄酮','非黄烷类酚类','花青素','颜色强度','色调','od280/od315稀释葡萄酒','脯氨酸']\n",
    "\n",
    "dot_data = tree.export_graphviz(clf   # the fitted model\n",
    "                                ,feature_names=feature_name              # feature names\n",
    "                                ,class_names=['琴酒','雪莉','贝尔摩德']    # class labels (Gin, Sherry, Vermouth)\n",
    "                                ,filled=True                              # colour each node by its majority class\n",
    "                                ,rounded=True                            # draw node boxes with rounded corners\n",
    "                                )\n",
    "\n",
    "graph = graphviz.Source(dot_data) # render the DOT source as a graph\n",
    "graph\n",
    "```\n",
    "### Feature selection\n",
    "* How important each feature is\n",
    "```python\n",
    "[*zip(feature_name,clf.feature_importances_)]\n",
    "```"
   ]
  },
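  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The importance list from the steps above is easier to read when it is sorted. The sketch below is one possible way to do that, not the course's own code: it uses the English `wine.feature_names` instead of the Chinese display names, and it fixes `random_state=0` in both the split and the tree (an assumption added here) so that repeated runs give the same ranking.\n",
    "\n",
    "```python\n",
    "from sklearn import tree\n",
    "from sklearn.datasets import load_wine\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "wine = load_wine()\n",
    "# Fixed seeds are an illustrative choice for reproducibility\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    wine.data, wine.target, test_size=0.3, random_state=0)\n",
    "clf = tree.DecisionTreeClassifier(criterion='gini', random_state=0)\n",
    "clf = clf.fit(X_train, y_train)\n",
    "\n",
    "# Pair each feature with its importance and sort from most to least important\n",
    "ranked = sorted(zip(wine.feature_names, clf.feature_importances_),\n",
    "                key=lambda pair: pair[1], reverse=True)\n",
    "for name, importance in ranked[:5]:\n",
    "    print(f'{name}: {importance:.3f}')\n",
    "```\n",
    "\n",
    "Features the tree never splits on get an importance of 0, so the tail of `ranked` shows which columns this particular tree ignored."
   ]
  },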
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
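    "# Important parameters\n",
    "\n",
    "```python\n",
    "tree.DecisionTreeClassifier(criterion='gini'       # how impurity is computed\n",
    "                            ,splitter='random'     # choose splits more randomly; one way to reduce overfitting\n",
    "                            ,random_state=0        # seed for the tree's randomness, so results are reproducible\n",
    "                            ,max_depth=3           # maximum depth of the tree\n",
    "                            ,min_samples_split=5   # a node must contain at least 5 samples before it can be split\n",
    "                            ,min_samples_leaf=5    # a split is made only if each child node keeps at least 5 samples\n",
    "                            ,min_weight_fraction_leaf=0.0\n",
    "                            ,max_features=None     # limits the number of features considered; crude dimensionality reduction\n",
    "                            ,min_impurity_decrease=0.0   # minimum impurity decrease (parent impurity minus weighted child impurity) required for a split\n",
    "                            ,max_leaf_nodes=None\n",
    "                            ,min_impurity_split=None     # deprecated; not accepted by newer scikit-learn releases\n",
    "                            ,class_weight=None\n",
    "                            ,presort='deprecated'        # deprecated; not accepted by newer scikit-learn releases\n",
    "                            ,ccp_alpha=0.0)\n",
    "```\n",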
    "\n",
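    "1. How impurity is computed: `criterion`\n",
    "    - Training looks for the splits that optimise impurity.\n",
    "    - The sample-weighted impurity of the child nodes is never higher than the impurity of the parent node.\n",
    "    - Impurity is the measure used to choose the best node and the best split.\n",
    "    - The lower the impurity, the better the tree fits the training set.\n",
    "    \n",
    "2. Controlling the tree's randomness: `random_state` & `splitter`\n",
    "    - `random_state`: an integer seed for the random number generator; fixing it makes the tree reproducible.\n",
    "    - `splitter = \"random\"`: splits are chosen more randomly, so the tree tends to grow deeper and fits the training set less closely. This is one way to reduce overfitting.\n",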
    "    \n",
    "    \n",
    "\n",
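    "3. Pruning parameters: the core issue for decision trees, with a very large effect on the model\n",
    "    * `max_depth`\n",
    "        - Limits the maximum depth of the tree; every branch beyond that depth is cut off.\n",
    "        - The most widely used pruning parameter; **very effective for high-dimensional data with few samples**.\n",
    "        - Start trying from `max_depth=3`.\n",
    "    * `min_samples_leaf` & `min_samples_split`\n",
    "        - Important parameters that constrain the leaf nodes.\n",
    "        - Used together, they make the model smoother.\n",
    "        - Set too small, they lead to overfitting.\n",
    "        - Set too large, they stop the model from learning the data.\n",
    "        - A suggested starting value is 5.\n",
    "        - If the leaf nodes contain very many samples, pass a float instead; it is interpreted as a fraction of the total number of samples.\n",
    "        - `min_samples_split`: a **node** must contain at least the given number of samples before it can be split.\n",
    "        - `min_samples_leaf`: each **child node** produced by a split must contain at least the given number of samples, otherwise the split is not made.\n",
    "    * `max_features` & `min_impurity_decrease`\n",
    "        - `max_features`: limits the number of features considered when splitting; a crude form of dimensionality reduction.\n",
    "            - To prevent overfitting through dimensionality reduction, prefer PCA, ICA, or the algorithms in the feature selection module.\n",
    "        - `min_impurity_decrease`: sets a lower bound on the information gain (impurity decrease) of a split.\n",
    "            - The larger the information gain, the more that split contributes to the tree.\n",
    "            - In other words, once the information gain falls below the given value, splitting stops.\n",
    "    \n",
    "    * **Choosing the best pruning parameters**: judge from a hyperparameter learning curve\n",
    "        - Hyperparameter learning curve\n",
    "            * The x-axis is the value of the hyperparameter.\n",
    "            * The y-axis is the model's evaluation metric, e.g. `clf.score(X_test, y_test)`.\n",
    "        - Python example:\n",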
    "            ```python\n",
    "            import matplotlib.pyplot as plt\n",
    "            test = []\n",
    "            for i in range(10):\n",
    "                clf = tree.DecisionTreeClassifier(max_depth=i+1\n",
    "                                                  ,criterion=\"entropy\"\n",
    "                                                  ,random_state=30\n",
    "                                                  ,splitter=\"random\"\n",
    "                                                 )\n",
    "                clf = clf.fit(X_train, y_train)\n",
    "                score = clf.score(X_test, y_test)\n",
    "                test.append(score)\n",
    "            plt.plot(range(1,11),test,color=\"red\",label=\"max_depth\")\n",
    "            plt.legend()\n",
    "            plt.show()\n",
    "            ```\n",
    "            \n",
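    "4. Class-weight parameters\n",
    "    * `class_weight` & `min_weight_fraction_leaf`: rarely used, and harder to apply\n",
    "        - `class_weight`\n",
    "            * Default is `None`, which gives every **label** in the dataset the same weight.\n",
    "        - `min_weight_fraction_leaf`: once weights are set, sample size is no longer a plain count of records but depends on the supplied weights, so pruning should then use the weight-based parameter `min_weight_fraction_leaf`."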
   ]
  },
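  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `criterion` and `min_impurity_decrease` notes above can be checked by hand. The sketch below computes Gini impurity for a node and the impurity decrease of one split. The class counts are taken from the root node and its two children in the rendered tree in this notebook's output (`value = [39, 53, 32]`, `[0, 9, 31]`, `[39, 44, 1]`); the helper names `gini` and `weighted_child_impurity` are illustrative and not part of scikit-learn.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def gini(counts):\n",
    "    # Gini impurity: 1 minus the sum of squared class proportions\n",
    "    p = np.asarray(counts, dtype=float) / np.sum(counts)\n",
    "    return 1.0 - np.sum(p ** 2)\n",
    "\n",
    "def weighted_child_impurity(left, right):\n",
    "    # Average of the two child impurities, weighted by child size\n",
    "    n_left, n_right = np.sum(left), np.sum(right)\n",
    "    return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)\n",
    "\n",
    "parent = [39, 53, 32]                  # class counts at the root node\n",
    "left, right = [0, 9, 31], [39, 44, 1]  # class counts in the two children\n",
    "\n",
    "# At the root, this difference is the quantity compared with min_impurity_decrease\n",
    "decrease = gini(parent) - weighted_child_impurity(left, right)\n",
    "print(round(gini(parent), 3), round(gini(left), 3), round(gini(right), 3))\n",
    "print(round(decrease, 3))\n",
    "```\n",
    "\n",
    "The three impurities should agree with the `gini` values printed inside the corresponding nodes of the tree figure, and the decrease is positive because the weighted child impurity is lower than the parent's."
   ]
  },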
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set shape (124, 13), test set shape (54, 13)\n"
     ]
    }
   ],
   "source": [
    "# Split the dataset\n",
    "X_train,X_test,y_train,y_test = train_test_split( wine.data,wine.target,test_size = 0.3) # the test set is 30% of the data\n",
    "print('Training set shape {0}, test set shape {1}'.format(X_train.shape,X_test.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAcIklEQVR4nO3deZRU9Zn/8fdDI7KqCK2jtNq4sSiL2uKC4xJEQX6CqFGZoIRxjmI0P3ScJOqMMRonJzPx5xiXuJyIRAIho9VEcIOIDKgxYre0AgKxB1FbUBpQUASh4fn9cauxaBu6oKv6W3Xr8zqnT3XdpeqpOvrh9lP3PmXujoiIxFer0AWIiEh2KehFRGJOQS8iEnMKehGRmFPQi4jEXOvQBTSma9euXlpaGroMEZG8UVlZucbdixtbl5NBX1paSkVFRegyRETyhpl9sKt1at2IiMScgl5EJOYU9CIiMZeTPXoRyT9bt26lpqaGzZs3hy4l1tq2bUtJSQn77LNP2vso6EUkI2pqaujUqROlpaWYWehyYsndWbt2LTU1NXTv3j3t/dS6EZGM2Lx5M126dFHIZ5GZ0aVLlz3+q0lBLyIZo5DPvr15j9W6iasNG+CBB+Drr0NXAt26wT/+I+xBT1FEMkdBH1cPPwz/9m+QC0dY7vD44/Dkk9CzZ+hqRAqOWjdx5A6TJsHpp8P27eF/nnoKli+HE06A+++PlonESGlpKWvWrNmrfSdOnMjKlSsz8li7oqCPo6oqWLwYrrwydCWRSy+FRYvg3HNh/HgYPBg+/DB0VSI5oWHQZ4NaN3E0aRK0aQOXXRa6km/83d/B9OkwYQLceCP06RN9hnDllbnRXpLMuvHG6IAjk/r3h/vu2+0mK1asYMiQIZxxxhn89a9/pV+/fowdO5Y77riD1atXM3ny5GR5N7Jp0ybatWvHE088QY8ePbj33ntZtGgREyZMYOHChYwaNYr58+fTvn37bz3P2rVrGTVqFLW1tQwYMIDUr2T9/e9/z/3338+WLVs45ZRT+M1vfkNRUREdO3bk2muvZc6cOXTu3JmpU6cyd+5cKioq+N73vke7du14/fXXAXjggQeYMWMGW7du5amnnqJnM1ueOqKPm7o6mDIFhg2DAw8MXc3OzODqq+Htt6FfPxgzBi65BGprQ1cmMVJdXc348eN55513WLp0KVOmTOHVV1/lnnvu4Re/+AU9e/Zk3rx5LFiwgLvuuovbbrsNiMK/urqaadOmMXbsWB599NFGQx7gzjvv5IwzzmDBggUMHz6cD5N/oS5ZsoQ//vGPvPbaa1RVVVFUVLTjH5eNGzdy4okn8tZbb3HWWWdx5513cumll1JWVsbkyZOpqqqiXbt2AHTt2pW33nqL6667jnvuuafZ74mO6OPmpZfg00/hqqtCV7JrRx4Jc+bAf/0X/Ou/wvHHw2OPwYgRoSuTTGniyDubunfvTp8+fQA47rjjGDRoEGZGnz59WLFiBevXr2fMmDG89957mBlbt24FoFWrVkycOJG+ffty7bXXMnDgwF0+x7x58ygvLwdg2LBhdO7cGYDZs2dTWVnJySefDMCmTZs46KCDdjz+5ZdfDsDo0aO5+OKLd/n49etOOumkHc/THDqij5tJk6Ij+QsuCF3J7hUVwb/8C1RWwqGHwkUXRadgbtgQujLJc/vuu++O31u1arXjfqtWrairq+P222/nnHPOYdGiRcyYMWOni4/ee+89OnbsmFbPvLHz2d2dMWPGUFVVRVVVFcuWLeNnP/tZ2vs3fA1FRUXU1dU1WUtTFPRx8sUXMG0aXH551KPPB8cfD2+8ER3Z/+530Lcv/M//hK5KYmz9+vV069YNiD4ITV0+fvx45s2bx9q1a3n66ad3+RhnnnnmjpbMCy+8wGeffQbAoEGDePrpp1m9ejUA69at44MPojHx27dv3/GYU6ZM4YwzzgCgU6dOfPHFF5l9kQ0o6OOkvBw2bcqd
s23S1aYN3H03vPZa9Ps558A//3P0WkQy7Mc//jG33norAwcOZNu2bTuW33TTTfzgBz/g2GOP5fHHH+eWW27ZEdgN3XHHHcybN48TTzyRWbNmcfjhhwPQu3dv7r77bs477zz69u3L4MGDWbVqFQAdOnRg8eLFnHTSSbz88sv89Kc/BeD73/8+48aNo3///mzK0n/zlvppca4oKytzfcPUXjj3XPjgA/jb3/L3TJaNG+EnP4GHHoJevaJW1Eknha5K0rBkyRJ69eoVuoyc1bFjR7788suMPFZj77WZVbp7WWPb64g+Lmpq4OWXYfTo/A15gA4d4MEHYebMqF9/6qlw112Q/MBMRPacgj4upkyJrogdPTp0JZlx3nmwcGH0ecMdd8DAgbBsWeiqpMA88cQT9O/ff6ef66+/fq8eK1NH83tDrZs4cI8uQNp//6jPHTdPPQXjxsFXX8F//AfccAO00jFKrlmyZAk9e/bUBMssc3eWLl2q1k3BybWRB5n23e9GIxQGDdIIhRzWtm1b1q5dSy4ePMZF/RePtG3bdo/20wVTcZCLIw8y7ZBDYMaMaArmTTdFf8E8+GD+fyYRIyUlJdTU1FCrK52zqv6rBPeEWjf5rq4OSkqiSZUZuIIuLyxfHo1PePVVGDkSHn0UiotDVyUSlFo3cVY/8iCubZvGHHlkdFHVf/4nPPdcdNHV9OmhqxLJWWkFvZkNMbNlZlZtZrc0sr6zmU0zs3fMbL6ZHZ+yboWZLTSzKjPTYXqmTZoEnTvn/siDTCsqgh/9CCoqorbOiBHRwDSNUBD5liaD3syKgIeAoUBvYJSZ9W6w2W1Albv3Ba4Cft1g/Tnu3n9Xf1bIXkodeZAy36Og9OkD8+fDbbfBxInRCIW5c0NXJZJT0jmiHwBUu/tyd98CTAUajhnsDcwGcPelQKmZHZzRSuXb8nXkQaa1aQP//u9Rz36ffaIRCjffDCnDqkQKWTpB3w34KOV+TXJZqreBiwHMbABwBFD/sbADs8ys0syu2dWTmNk1ZlZhZhX61D5NkybBUUfBaaeFriQ3nHZadKrpddfBvfdGoxMqK0NXJRJcOkHf2LlrDU/V+SXQ2cyqgB8CC4D62ZoD3f1EotbP9WZ2ZmNP4u6PuXuZu5cV6wyKptWPPNA3NO2sQ4doTs6LL8Lnn0cjFH7+8+jsJJEClc559DXAYSn3S4CdhjW7+wZgLIBFl8W9n/zB3Vcmb1eb2TSiVtC8Zlde6OI28iDTzj8/usjqhhvgpz+FZ5+FO++EPbzQRKRFtWkTnSqdYekE/ZvAMWbWHfgYuAL4h9QNzOwA4KtkD/+fgHnuvsHMOgCt3P2L5O/nAXdl8gUUJPeobXP66VHrRhrXuTNMnhydkXPddTB0aOiKRHbv4IPhk08y/rBNBr2715nZDcBMoAiY4O6LzWxccv0jQC/gSTPbBrwLXF1fNjAtOfuiNTDF3V/M+KsoNG+/HR2tPvxw6Eryw2WXReMTFi4MXYnI7mXpC4N0ZWw+uvnm6PL/Vaty7wvARSQIXRkbJ3V1UX9+2DCFvIikRUGfb156KerhFfq58yKSNgV9vinUkQcistcU9PlEIw9EZC8o6POJRh6IyF5Q0OcTjTwQkb2goM8X9SMP9I1KIrKHFPT5QiMPRGQvKejzQf3Ig9NOg6OPDl2NiOQZBX0+qB95cNVVoSsRkTykoM8HkyZFMzAuuyx0JSKShxT0uU4jD0SkmRT0uW72bI08EJFmUdDnOo08EJFmUtDnMo08EJEMUNDnsvJy+OortW1EpFkU9LlMIw9EJAMU9LlKIw9EJEMU9LlKIw9EJEMU9LlIIw9EJIMU9LmofuSBPoQVkQxQ0OeiSZNgn3008kBEMiKtoDezIWa2zMyqzeyWRtZ3NrNpZvaOmc03s+PT3VcaSB150KVL6GpEJAaaDHozKwIeAoYCvYFRZta7wWa3AVXu3he4Cvj1HuwrqTTyQEQyLJ0j+gFAtbsvd/ctwFRg
RINtegOzAdx9KVBqZgenua+kqh95MGxY6EpEJCbSCfpuwEcp92uSy1K9DVwMYGYDgCOAkjT3JbnfNWZWYWYVtbW16VUfN19+qZEHIpJx6QR9Y1freIP7vwQ6m1kV8ENgAVCX5r7RQvfH3L3M3cuKi4vTKCuGNPJARLKgdRrb1ACHpdwvAVambuDuG4CxAGZmwPvJn/ZN7SspNPJARLIgnSP6N4FjzKy7mbUBrgCmp25gZgck1wH8EzAvGf5N7itJH38cfRCrkQcikmFNHtG7e52Z3QDMBIqACe6+2MzGJdc/AvQCnjSzbcC7wNW72zc7LyXPTZ6skQcikhXm3mjLPKiysjKvqKgIXUbLcYe+faFTJ/jLX0JXIyJ5yMwq3b2ssXW6MjYXaOSBiGSRgj4XaOSBiGSRgj40jTwQkSxT0IemkQcikmUK+tA08kBEskxBH1L9yIPLLtPIAxHJGgV9SBp5ICItQEEf0qRJcOSRcPrpoSsRkRhT0IeikQci0kIU9KFMmRJdEau2jYhkmYI+BHd48sloSuXRR4euRkRiTkEfgkYeiEgLUtCHoJEHItKCFPQtTSMPRKSFKehbmkYeiEgLU9C3NI08EJEWpqBvSRp5ICIBKOhbkkYeiEgACvqWpJEHIhKAgr6laOSBiASioG8p9SMPRo8OXYmIFBgFfUuZNAlOPRWOOSZ0JSJSYBT0LeHtt2HhQn0IKyJBpBX0ZjbEzJaZWbWZ3dLI+v3NbIaZvW1mi81sbMq6FWa20MyqzKwik8XnjfqRB5dfHroSESlArZvawMyKgIeAwUAN8KaZTXf3d1M2ux54190vNLNiYJmZTXb3Lcn157j7mkwXnxfq6mDyZI08EJFg0jmiHwBUu/vyZHBPBUY02MaBTmZmQEdgHVCX0UrzlUYeiEhg6QR9N+CjlPs1yWWpHgR6ASuBhcB4d9+eXOfALDOrNLNrdvUkZnaNmVWYWUVtbW3aLyDnaeSBiASWTtA3dtK3N7h/PlAFHAr0Bx40s/2S6wa6+4nAUOB6MzuzsSdx98fcvczdy4qLi9OpPfdp5IGI5IB0gr4GOCzlfgnRkXuqsUC5R6qB94GeAO6+Mnm7GphG1AoqDBp5ICI5IJ2gfxM4xsy6m1kb4ApgeoNtPgQGAZjZwUAPYLmZdTCzTsnlHYDzgEWZKj7naeSBiOSAJs+6cfc6M7sBmAkUARPcfbGZjUuufwT4OTDRzBYStXp+4u5rzOxIYFr0GS2tgSnu/mKWXktuqR95cPvtGnkgIkE1GfQA7v488HyDZY+k/L6S6Gi94X7LgX7NrDE/aeSBiOQIXRmbLRp5ICI5QkGfDRp5ICI5REGfDRp5ICI5REGfadu2Rf35Cy7QyAMRyQkK+kybNw9WrdKHsCKSMxT0mZZIQLt2MHRo6EpERAAFfWZt3x5dDTt0KHToELoaERFAQZ9Zb7wRtW0uuSR0JSIiOyjoMymRiM620aRKEckhCvpMcY+CfvBg2H//0NWIiOygoM+UBQtgxQq1bUQk5yjoM6W8HIqKYPjw0JWIiOxEQZ8piQScdRZ07Rq6EhGRnSjoM+Hdd2HpUrVtRCQnKegzIZGIZs6PHBm6EhGRb1HQZ0J5OZx2GhxySOhKRES+RUHfXMuXQ1WV2jYikrMU9M2VSES3F18ctg4RkV1Q0DdXIgEnnQSlpaErERFplIK+OWpqovk2OpoXkRymoG+OadOiW/XnRSSHKeibI5GA446DHj1CVyIisktpBb2ZDTGzZWZWbWa3NLJ+fzObYWZvm9liMxub7r55a/VqeOUVHc2LSM5rMujNrAh4CBgK9AZGmVnvBptdD7zr7v2As4H/Z2Zt0tw3Pz3zTPRFI+rPi0iOS+eIfgBQ7e7L3X0LMBUY0WAbBzqZmQEdgXVAXZr75qdEAo46Cvr2DV2JiMhupRP03YCPUu7XJJelehDoBawEFgLj3X17mvsCYGbXmFmFmVXU1tamWX4gn30Gs2dHbRuz0NWI
iOxWOkHfWJJ5g/vnA1XAoUB/4EEz2y/NfaOF7o+5e5m7lxUXF6dRVkDPPgt1dWrbiEheSCfoa4DDUu6XEB25pxoLlHukGngf6JnmvvknkYCSEjj55NCViIg0KZ2gfxM4xsy6m1kb4ApgeoNtPgQGAZjZwUAPYHma++aXL7+EmTOjo/lWOjtVRHJf66Y2cPc6M7sBmAkUARPcfbGZjUuufwT4OTDRzBYStWt+4u5rABrbNzsvpYU8/zxs3qzTKkUkb5h7oy3zoMrKyryioiJ0GY274gqYMwdWroy+OlBEJAeYWaW7lzW2Tr2HPbF5Mzz3HFx0kUJeRPKGgn5PzJoV9ejVthGRPKKg3xOJBBxwAJx9duhKRETSpqBP19atMH06DB8ObdqErkZEJG0K+nTNmQOff662jYjkHQV9uhIJ6NABzjsvdCUiIntEQZ+ObdvgT3+CYcOgbdvQ1YiI7BEFfTpeey2aP6+2jYjkIQV9OhKJ6Ej+ggtCVyIisscU9E3Zvh3Ky+H886Fjx9DViIjsMQV9UyoqoKZGI4lFJG8p6JuSSEDr1nDhhaErERHZKwr63XGPgn7QIOjcOXQ1IiJ7RUG/O++8A//7v2rbiEheU9DvTnl59OUiF10UuhIRkb2moN+dRAL+/u/hoINCVyIistcU9LuybBksXqyLpEQk7ynodyWRiG5Hjgxbh4hIMynod6W8HE45BUpKQlciItIsCvrGrFgBlZVq24hILCjoG1NeHt3qtEoRiQEFfWPKy6FfPzjqqNCViIg0W1pBb2ZDzGyZmVWb2S2NrP+RmVUlfxaZ2TYzOzC5boWZLUyuq8j0C8i4VavgL39R20ZEYqN1UxuYWRHwEDAYqAHeNLPp7v5u/Tbu/ivgV8ntLwRucvd1KQ9zjruvyWjl2TJtWjT6QEEvIjGRzhH9AKDa3Ze7+xZgKjBiN9uPAv6QieKCSCSgRw/o1St0JSIiGZFO0HcDPkq5X5Nc9i1m1h4YAiRSFjswy8wqzeyavS20RaxZA3PnRkfzZqGrERHJiCZbN0Bjiee72PZC4LUGbZuB7r7SzA4C/mxmS9193reeJPpH4BqAww8/PI2ysmD69Oj7YdW2EZEYSeeIvgY4LOV+CbByF9teQYO2jbuvTN6uBqYRtYK+xd0fc/cydy8rLi5Oo6wsSCSgtBROOCHM84uIZEE6Qf8mcIyZdTezNkRhPr3hRma2P3AW8EzKsg5m1qn+d+A8YFEmCs+49evhz3+Ozp1X20ZEYqTJ1o2715nZDcBMoAiY4O6LzWxccv0jyU1HArPcfWPK7gcD0ywKztbAFHd/MZMvIGOeew62blXbRkRix9x31W4Pp6yszCsqWviU+0sugddfj74ftpWuIxOR/GJmle5e1tg6JRrAxo3wwgvRpEqFvIjEjFIN4MUXYdMmtW1EJJYU9BDNtunSBc48M3QlIiIZp6D/+mt49tnoe2Fbp3NZgYhIflHQv/QSbNigkcQiElsK+vJy2G8/GDQodCUiIllR2EFfVwfPPAMXXgj77hu6GhGRrCjsoJ87F9auVdtGRGKtsIM+kYD27WHIkNCViIhkTeEG/fbt0ZeMDB0ahb2ISEwVbtC//jp88okukhKR2CvcoE8koE0bGDYsdCUiIllVmEHvHp1WOXhwdGqliEiMFWbQv/UWfPCB2jYiUhAKM+gTCSgqguHDQ1ciIpJ1hRf07lHQn312NMhMRCTmCi/oFy+Gv/1NbRsRKRiFF/Tl5dF3wl50UehKRERaROEFfSIBp58OhxwSuhIRkRZRWEFfXQ3vvKO2jYgUlMIK+vLy6FZDzESkgBRW0CcSUFYGRxwRuhIRkRZTOEH/0Ucwf76O5kWk4KQV9GY2xMyWmVm1md3SyPofmVlV8meRmW0zswPT2bfF1Ldt1J8XkQLTZNCbWRHwEDAU6A2MMrPeqdu4+6/cvb+79wduBea6+7p09m0x5eVw/PFw7LFBnl5EJJR0jugHANXuvtzdtwBT
gRG72X4U8Ie93Dc7Pv0UXnlFR/MiUpDSCfpuwEcp92uSy77FzNoDQ4DEXux7jZlVmFlFbW1tGmXtgT/9KRp9oP68iBSgdILeGlnmu9j2QuA1d1+3p/u6+2PuXubuZcXFxWmUtQcSCTj6aOjTJ7OPKyKSB9IJ+hrgsJT7JcDKXWx7Bd+0bfZ03+xYtw7mzInaNtbYvzsiIvGWTtC/CRxjZt3NrA1RmE9vuJGZ7Q+cBTyzp/tm1YwZUFento2IFKzWTW3g7nVmdgMwEygCJrj7YjMbl1z/SHLTkcAsd9/Y1L6ZfhG7lUjAYYfBySe36NOKiOQKc99Vuz2csrIyr6ioaP4DffEFFBfDuHFw333NfzwRkRxlZpXuXtbYunhfGfv88/D11zqtUkQKWryDPpGAgw6KxhKLiBSo+Ab9pk3REf3IkdH3w4qIFKj4Bv2sWbBxo9o2IlLw4hv0iQR07hx9CbiISAGLZ9Bv2QLTp8Pw4bDPPqGrEREJKp5B//LLsH692jYiIsQ16MvLoWNHGDw4dCUiIsHFL+i3bYumVQ4bBm3bhq5GRCS4+AX9K69Aba3aNiIiSfEL+kQiOpIfOjR0JSIiOSFeQb99O0ybBkOGRD16ERGJWdDPnw8ff6yRxCIiKeIV9IlEdN78hReGrkREJGfEJ+jdo6AfNAgOOCB0NSIiOaPJLx7JG199Bd/5ThT0IiKyQ3yCvkMH+O1vQ1chIpJz4tO6ERGRRinoRURiTkEvIhJzCnoRkZhT0IuIxJyCXkQk5hT0IiIxp6AXEYk5c/fQNXyLmdUCH4Suo5m6AmtCF5Ej9F7sTO/HzvR+fKM578UR7l7c2IqcDPo4MLMKdy8LXUcu0HuxM70fO9P78Y1svRdq3YiIxJyCXkQk5hT02fNY6AJyiN6Lnen92Jnej29k5b1Qj15EJOZ0RC8iEnMKehGRmFPQZ5CZHWZmc8xsiZktNrPxoWsKzcyKzGyBmT0bupbQzOwAM3vazJYm/xs5LXRNIZnZTcn/TxaZ2R/MrG3omlqSmU0ws9Vmtihl2YFm9mczey952zkTz6Wgz6w64GZ37wWcClxvZr0D1xTaeGBJ6CJyxK+BF929J9CPAn5fzKwb8H+BMnc/HigCrghbVYubCAxpsOwWYLa7HwPMTt5vNgV9Brn7Knd/K/n7F0T/I3cLW1U4ZlYCDAMK/jsezWw/4EzgcQB33+LunwctKrzWQDszaw20B1YGrqdFufs8YF2DxSOA3yV//x1wUSaeS0GfJWZWCpwAvBG4lJDuA34MbA9cRy44EqgFnki2sn5rZh1CFxWKu38M3AN8CKwC1rv7rLBV5YSD3X0VRAeOwEGZeFAFfRaYWUcgAdzo7htC1xOCmf0fYLW7V4auJUe0Bk4EHnb3E4CNZOjP8nyU7D2PALoDhwIdzGx02KriS0GfYWa2D1HIT3b38tD1BDQQGG5mK4CpwHfM7PdhSwqqBqhx9/q/8J4mCv5CdS7wvrvXuvtWoBw4PXBNueBTMzsEIHm7OhMPqqDPIDMzoh7sEne/N3Q9Ibn7re5e4u6lRB+yvezuBXvE5u6fAB+ZWY/kokHAuwFLCu1D4FQza5/8/2YQBfzhdIrpwJjk72OAZzLxoK0z8SCyw0DgSmChmVUll93m7s+HK0lyyA+ByWbWBlgOjA1cTzDu/oaZPQ28RXS22gIKbBSCmf0BOBvoamY1wB3AL4H/NrOrif4x/G5GnksjEERE4k2tGxGRmFPQi4jEnIJeRCTmFPQiIjGnoBcRiTkFvYhIzCnoRURi7v8Dq+soWAgRF0kAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Find the best pruning parameter\n",
    "import matplotlib.pyplot as plt\n",
    "test = []\n",
    "for i in range(10):\n",
    "    clf = tree.DecisionTreeClassifier(max_depth=i+1\n",
    "                                    ,criterion=\"entropy\"\n",
    "                                    ,random_state=30\n",
    "                                    ,splitter=\"random\"\n",
    "                                   )\n",
    "    clf = clf.fit(X_train, y_train)\n",
    "    score = clf.score(X_test, y_test)\n",
    "    test.append(score)\n",
    "plt.plot(range(1,11),test,color=\"red\",label=\"max_depth\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
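  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The curve above is computed from a single random train/test split, so its shape can change from run to run. As a hedged alternative (cross-validation is swapped in here; it is not what the cell above does), the sketch below scores each `max_depth` with 5-fold cross-validation on the full dataset. The tree settings mirror the loop above; `cv=5` and the names `mean_scores` and `best_depth` are assumptions made for illustration.\n",
    "\n",
    "```python\n",
    "from sklearn import tree\n",
    "from sklearn.datasets import load_wine\n",
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "wine = load_wine()\n",
    "depths = range(1, 11)\n",
    "mean_scores = []\n",
    "for depth in depths:\n",
    "    clf = tree.DecisionTreeClassifier(max_depth=depth,\n",
    "                                      criterion='entropy',\n",
    "                                      random_state=30,\n",
    "                                      splitter='random')\n",
    "    # Mean accuracy over 5 folds for this depth\n",
    "    scores = cross_val_score(clf, wine.data, wine.target, cv=5)\n",
    "    mean_scores.append(scores.mean())\n",
    "\n",
    "# Depth with the highest mean cross-validated accuracy (first one on ties)\n",
    "best_depth = depths[mean_scores.index(max(mean_scores))]\n",
    "print(best_depth, round(max(mean_scores), 3))\n",
    "```\n",
    "\n",
    "Plotting `mean_scores` against `depths` with the same `plt.plot` call as above gives a curve that depends less on how one particular split happened to fall."
   ]
  },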
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9629629629629629"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
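    "clf = tree.DecisionTreeClassifier(criterion='gini'    # how impurity is computed\n",
    "                                  ,random_state=0     # seed for the tree's randomness, for reproducibility\n",
    "                                  ,splitter='random'  # choose splits more randomly; helps reduce overfitting\n",
    "                                  ,max_depth= 3       # limit the maximum depth of the tree\n",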
    "#                                   ,min_samples_leaf=10 \n",
    "#                                   ,min_samples_split=10\n",
    "#                                   ,min_impurity_decrease=0.001\n",
    "                                 )\n",
    "clf = clf.fit(X_train,y_train)      # train the model on the training set through the fit interface\n",
    "score = clf.score(X_test,y_test)    # query the fitted model; score returns the accuracy on the test set\n",
    "score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: Tree Pages: 1 -->\r\n",
       "<svg width=\"1066pt\" height=\"433pt\"\r\n",
       " viewBox=\"0.00 0.00 1066.00 433.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 429)\">\r\n",
       "<title>Tree</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-429 1062,-429 1062,4 -4,4\"/>\r\n",
       "<!-- 0 -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>0</title>\r\n",
       "<path fill=\"#defbea\" stroke=\"black\" d=\"M584,-425C584,-425 471,-425 471,-425 465,-425 459,-419 459,-413 459,-413 459,-354 459,-354 459,-348 465,-342 471,-342 471,-342 584,-342 584,-342 590,-342 596,-348 596,-354 596,-354 596,-413 596,-413 596,-419 590,-425 584,-425\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"527.5\" y=\"-409.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">类黄酮 &lt;= 1.401</text>\r\n",
       "<text text-anchor=\"middle\" x=\"527.5\" y=\"-394.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.652</text>\r\n",
       "<text text-anchor=\"middle\" x=\"527.5\" y=\"-379.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 124</text>\r\n",
       "<text text-anchor=\"middle\" x=\"527.5\" y=\"-364.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [39, 53, 32]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"527.5\" y=\"-349.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 1 -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>1</title>\r\n",
       "<path fill=\"#a672ed\" stroke=\"black\" d=\"M475,-306C475,-306 378,-306 378,-306 372,-306 366,-300 366,-294 366,-294 366,-235 366,-235 366,-229 372,-223 378,-223 378,-223 475,-223 475,-223 481,-223 487,-229 487,-235 487,-235 487,-294 487,-294 487,-300 481,-306 475,-306\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-290.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">色调 &lt;= 0.935</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-275.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.349</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-260.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 40</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-245.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 9, 31]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-230.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 贝尔摩德</text>\r\n",
       "</g>\r\n",
       "<!-- 0&#45;&gt;1 -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>0&#45;&gt;1</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M492.459,-341.907C484.624,-332.832 476.241,-323.121 468.169,-313.769\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"470.663,-311.303 461.48,-306.021 465.365,-315.878 470.663,-311.303\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"459.65\" y=\"-327.254\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">True</text>\r\n",
       "</g>\r\n",
       "<!-- 8 -->\r\n",
       "<g id=\"node9\" class=\"node\"><title>8</title>\r\n",
       "<path fill=\"#e9fcf1\" stroke=\"black\" d=\"M683.5,-306C683.5,-306 575.5,-306 575.5,-306 569.5,-306 563.5,-300 563.5,-294 563.5,-294 563.5,-235 563.5,-235 563.5,-229 569.5,-223 575.5,-223 575.5,-223 683.5,-223 683.5,-223 689.5,-223 695.5,-229 695.5,-235 695.5,-235 695.5,-294 695.5,-294 695.5,-300 689.5,-306 683.5,-306\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-290.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">脯氨酸 &lt;= 947.397</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-275.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.51</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-260.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 84</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-245.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [39, 44, 1]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-230.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 0&#45;&gt;8 -->\r\n",
       "<g id=\"edge8\" class=\"edge\"><title>0&#45;&gt;8</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M562.888,-341.907C570.8,-332.832 579.266,-323.121 587.419,-313.769\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"590.241,-315.859 594.174,-306.021 584.965,-311.259 590.241,-315.859\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"595.882\" y=\"-327.262\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">False</text>\r\n",
       "</g>\r\n",
       "<!-- 2 -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>2</title>\r\n",
       "<path fill=\"#8946e7\" stroke=\"black\" d=\"M240,-187C240,-187 143,-187 143,-187 137,-187 131,-181 131,-175 131,-175 131,-116 131,-116 131,-110 137,-104 143,-104 143,-104 240,-104 240,-104 246,-104 252,-110 252,-116 252,-116 252,-175 252,-175 252,-181 246,-187 240,-187\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-171.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">灰 &lt;= 2.144</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-156.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.117</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-141.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 32</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-126.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 2, 30]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-111.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 贝尔摩德</text>\r\n",
       "</g>\r\n",
       "<!-- 1&#45;&gt;2 -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>1&#45;&gt;2</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M365.982,-233.37C334.035,-217.464 294.673,-197.867 261.425,-181.314\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"262.733,-178.055 252.221,-176.731 259.613,-184.322 262.733,-178.055\"/>\r\n",
       "</g>\r\n",
       "<!-- 5 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>5</title>\r\n",
       "<path fill=\"#55e993\" stroke=\"black\" d=\"M478.5,-187C478.5,-187 374.5,-187 374.5,-187 368.5,-187 362.5,-181 362.5,-175 362.5,-175 362.5,-116 362.5,-116 362.5,-110 368.5,-104 374.5,-104 374.5,-104 478.5,-104 478.5,-104 484.5,-104 490.5,-110 490.5,-116 490.5,-116 490.5,-175 490.5,-175 490.5,-181 484.5,-187 478.5,-187\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-171.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">颜色强度 &lt;= 4.863</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-156.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.219</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-141.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 8</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-126.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 7, 1]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"426.5\" y=\"-111.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 1&#45;&gt;5 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>1&#45;&gt;5</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M426.5,-222.907C426.5,-214.649 426.5,-205.864 426.5,-197.302\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"430,-197.021 426.5,-187.021 423,-197.021 430,-197.021\"/>\r\n",
       "</g>\r\n",
       "<!-- 3 -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>3</title>\r\n",
       "<path fill=\"#ffffff\" stroke=\"black\" d=\"M101,-68C101,-68 12,-68 12,-68 6,-68 0,-62 0,-56 0,-56 0,-12 0,-12 0,-6 6,-0 12,-0 12,-0 101,-0 101,-0 107,-0 113,-6 113,-12 113,-12 113,-56 113,-56 113,-62 107,-68 101,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"56.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.5</text>\r\n",
       "<text text-anchor=\"middle\" x=\"56.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 2</text>\r\n",
       "<text text-anchor=\"middle\" x=\"56.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 1, 1]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"56.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 2&#45;&gt;3 -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>2&#45;&gt;3</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M141.231,-103.726C129.536,-94.2406 117.102,-84.1551 105.465,-74.7159\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"107.526,-71.8808 97.5543,-68.2996 103.116,-77.3173 107.526,-71.8808\"/>\r\n",
       "</g>\r\n",
       "<!-- 4 -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>4</title>\r\n",
       "<path fill=\"#8540e6\" stroke=\"black\" d=\"M240,-68C240,-68 143,-68 143,-68 137,-68 131,-62 131,-56 131,-56 131,-12 131,-12 131,-6 137,-0 143,-0 143,-0 240,-0 240,-0 246,-0 252,-6 252,-12 252,-12 252,-56 252,-56 252,-62 246,-68 240,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.064</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 30</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 1, 29]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"191.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 贝尔摩德</text>\r\n",
       "</g>\r\n",
       "<!-- 2&#45;&gt;4 -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>2&#45;&gt;4</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M191.5,-103.726C191.5,-95.5175 191.5,-86.8595 191.5,-78.56\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"195,-78.2996 191.5,-68.2996 188,-78.2996 195,-78.2996\"/>\r\n",
       "</g>\r\n",
       "<!-- 6 -->\r\n",
       "<g id=\"node7\" class=\"node\"><title>6</title>\r\n",
       "<path fill=\"#39e581\" stroke=\"black\" d=\"M371,-68C371,-68 282,-68 282,-68 276,-68 270,-62 270,-56 270,-56 270,-12 270,-12 270,-6 276,-0 282,-0 282,-0 371,-0 371,-0 377,-0 383,-6 383,-12 383,-12 383,-56 383,-56 383,-62 377,-68 371,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"326.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.0</text>\r\n",
       "<text text-anchor=\"middle\" x=\"326.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 7</text>\r\n",
       "<text text-anchor=\"middle\" x=\"326.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 7, 0]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"326.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 5&#45;&gt;6 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>5&#45;&gt;6</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M389.264,-103.726C380.934,-94.6054 372.098,-84.93 363.767,-75.8078\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"366.239,-73.3235 356.911,-68.2996 361.07,-78.044 366.239,-73.3235\"/>\r\n",
       "</g>\r\n",
       "<!-- 7 -->\r\n",
       "<g id=\"node8\" class=\"node\"><title>7</title>\r\n",
       "<path fill=\"#8139e5\" stroke=\"black\" d=\"M505.5,-68C505.5,-68 413.5,-68 413.5,-68 407.5,-68 401.5,-62 401.5,-56 401.5,-56 401.5,-12 401.5,-12 401.5,-6 407.5,-0 413.5,-0 413.5,-0 505.5,-0 505.5,-0 511.5,-0 517.5,-6 517.5,-12 517.5,-12 517.5,-56 517.5,-56 517.5,-62 511.5,-68 505.5,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"459.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.0</text>\r\n",
       "<text text-anchor=\"middle\" x=\"459.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 1</text>\r\n",
       "<text text-anchor=\"middle\" x=\"459.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 0, 1]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"459.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 贝尔摩德</text>\r\n",
       "</g>\r\n",
       "<!-- 5&#45;&gt;7 -->\r\n",
       "<g id=\"edge7\" class=\"edge\"><title>5&#45;&gt;7</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M438.788,-103.726C441.317,-95.3351 443.987,-86.4745 446.539,-78.0072\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"449.93,-78.8842 449.465,-68.2996 443.228,-76.8643 449.93,-78.8842\"/>\r\n",
       "</g>\r\n",
       "<!-- 9 -->\r\n",
       "<g id=\"node10\" class=\"node\"><title>9</title>\r\n",
       "<path fill=\"#62ea9b\" stroke=\"black\" d=\"M678,-187C678,-187 581,-187 581,-187 575,-187 569,-181 569,-175 569,-175 569,-116 569,-116 569,-110 575,-104 581,-104 581,-104 678,-104 678,-104 684,-104 690,-110 690,-116 690,-116 690,-175 690,-175 690,-181 684,-187 678,-187\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-171.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">酒精 &lt;= 13.051</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-156.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.292</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-141.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 52</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-126.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [8, 43, 1]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"629.5\" y=\"-111.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 8&#45;&gt;9 -->\r\n",
       "<g id=\"edge9\" class=\"edge\"><title>8&#45;&gt;9</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M629.5,-222.907C629.5,-214.649 629.5,-205.864 629.5,-197.302\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"633,-197.021 629.5,-187.021 626,-197.021 633,-197.021\"/>\r\n",
       "</g>\r\n",
       "<!-- 12 -->\r\n",
       "<g id=\"node13\" class=\"node\"><title>12</title>\r\n",
       "<path fill=\"#e6853f\" stroke=\"black\" d=\"M923,-187C923,-187 810,-187 810,-187 804,-187 798,-181 798,-175 798,-175 798,-116 798,-116 798,-110 804,-104 810,-104 810,-104 923,-104 923,-104 929,-104 935,-110 935,-116 935,-116 935,-175 935,-175 935,-181 929,-187 923,-187\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-171.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">灰的碱性 &lt;= 28.549</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-156.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.061</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-141.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 32</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-126.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [31, 1, 0]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-111.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 琴酒</text>\r\n",
       "</g>\r\n",
       "<!-- 8&#45;&gt;12 -->\r\n",
       "<g id=\"edge12\" class=\"edge\"><title>8&#45;&gt;12</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M695.518,-230.909C724.439,-216.631 758.587,-199.774 788.731,-184.892\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"790.53,-187.908 797.947,-180.342 787.431,-181.631 790.53,-187.908\"/>\r\n",
       "</g>\r\n",
       "<!-- 10 -->\r\n",
       "<g id=\"node11\" class=\"node\"><title>10</title>\r\n",
       "<path fill=\"#3ee684\" stroke=\"black\" d=\"M645,-68C645,-68 548,-68 548,-68 542,-68 536,-62 536,-56 536,-56 536,-12 536,-12 536,-6 542,-0 548,-0 548,-0 645,-0 645,-0 651,-0 657,-6 657,-12 657,-12 657,-56 657,-56 657,-62 651,-68 645,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"596.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.049</text>\r\n",
       "<text text-anchor=\"middle\" x=\"596.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 40</text>\r\n",
       "<text text-anchor=\"middle\" x=\"596.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [1, 39, 0]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"596.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 9&#45;&gt;10 -->\r\n",
       "<g id=\"edge10\" class=\"edge\"><title>9&#45;&gt;10</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M617.212,-103.726C614.683,-95.3351 612.013,-86.4745 609.461,-78.0072\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"612.772,-76.8643 606.535,-68.2996 606.07,-78.8842 612.772,-76.8643\"/>\r\n",
       "</g>\r\n",
       "<!-- 11 -->\r\n",
       "<g id=\"node12\" class=\"node\"><title>11</title>\r\n",
       "<path fill=\"#f5d0b5\" stroke=\"black\" d=\"M776,-68C776,-68 687,-68 687,-68 681,-68 675,-62 675,-56 675,-56 675,-12 675,-12 675,-6 681,-0 687,-0 687,-0 776,-0 776,-0 782,-0 788,-6 788,-12 788,-12 788,-56 788,-56 788,-62 782,-68 776,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"731.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.542</text>\r\n",
       "<text text-anchor=\"middle\" x=\"731.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 12</text>\r\n",
       "<text text-anchor=\"middle\" x=\"731.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [7, 4, 1]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"731.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 琴酒</text>\r\n",
       "</g>\r\n",
       "<!-- 9&#45;&gt;11 -->\r\n",
       "<g id=\"edge11\" class=\"edge\"><title>9&#45;&gt;11</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M667.481,-103.726C675.977,-94.6054 684.99,-84.93 693.487,-75.8078\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"696.226,-78.0025 700.481,-68.2996 691.104,-73.2312 696.226,-78.0025\"/>\r\n",
       "</g>\r\n",
       "<!-- 13 -->\r\n",
       "<g id=\"node14\" class=\"node\"><title>13</title>\r\n",
       "<path fill=\"#e58139\" stroke=\"black\" d=\"M915,-68C915,-68 818,-68 818,-68 812,-68 806,-62 806,-56 806,-56 806,-12 806,-12 806,-6 812,-0 818,-0 818,-0 915,-0 915,-0 921,-0 927,-6 927,-12 927,-12 927,-56 927,-56 927,-62 921,-68 915,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.0</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 31</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [31, 0, 0]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"866.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 琴酒</text>\r\n",
       "</g>\r\n",
       "<!-- 12&#45;&gt;13 -->\r\n",
       "<g id=\"edge13\" class=\"edge\"><title>12&#45;&gt;13</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M866.5,-103.726C866.5,-95.5175 866.5,-86.8595 866.5,-78.56\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"870,-78.2996 866.5,-68.2996 863,-78.2996 870,-78.2996\"/>\r\n",
       "</g>\r\n",
       "<!-- 14 -->\r\n",
       "<g id=\"node15\" class=\"node\"><title>14</title>\r\n",
       "<path fill=\"#39e581\" stroke=\"black\" d=\"M1046,-68C1046,-68 957,-68 957,-68 951,-68 945,-62 945,-56 945,-56 945,-12 945,-12 945,-6 951,-0 957,-0 957,-0 1046,-0 1046,-0 1052,-0 1058,-6 1058,-12 1058,-12 1058,-56 1058,-56 1058,-62 1052,-68 1046,-68\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"1001.5\" y=\"-52.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">gini = 0.0</text>\r\n",
       "<text text-anchor=\"middle\" x=\"1001.5\" y=\"-37.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">samples = 1</text>\r\n",
       "<text text-anchor=\"middle\" x=\"1001.5\" y=\"-22.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">value = [0, 1, 0]</text>\r\n",
       "<text text-anchor=\"middle\" x=\"1001.5\" y=\"-7.8\" font-family=\"Helvetica,sans-Serif\" font-size=\"14.00\">class = 雪莉</text>\r\n",
       "</g>\r\n",
       "<!-- 12&#45;&gt;14 -->\r\n",
       "<g id=\"edge14\" class=\"edge\"><title>12&#45;&gt;14</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M916.769,-103.726C928.464,-94.2406 940.898,-84.1551 952.535,-74.7159\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"954.884,-77.3173 960.446,-68.2996 950.474,-71.8808 954.884,-77.3173\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.files.Source at 0x19039e1aca0>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature_name = ['酒精','苹果酸','灰','灰的碱性','镁','总酚','类黄酮','非黄烷类酚类','花青素','颜色强度','色调','od280/od315稀释葡萄酒','脯氨酸']\n",
    "\n",
    "dot_data = tree.export_graphviz(clf   # 输入已经训练好的模型\n",
    "                                ,feature_names=feature_name              # 特征的名字\n",
    "                                ,class_names=['琴酒','雪莉','贝尔摩德']    # 标签名\n",
    "                                ,filled=True                             # 显示颜色\n",
    "                                ,rounded=True                            # 圆角框\n",
    "                                )\n",
    "\n",
    "graph = graphviz.Source(dot_data) # 导出树\n",
    "graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 判断树对训练集的拟合程度如何"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9354838709677419"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_train = clf.score(X_train,y_train)\n",
    "score_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9629629629629629"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_test =clf.score(X_test,y_test)\n",
    "score_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 参数\n",
    "\n",
    "1. criterion，字符型，可不填，默认基尼系数（gini）\n",
    "                * 用来衡量分枝质量的指标，即衡量不纯度的指标 \n",
    "                * 输入“gini”，使用“基尼系数”；\n",
    "                * 输入“entropy”使用信息熵的“信息增益”，即父节点信息熵和子节点的信息熵之差；\n",
    "                * 选取：\n",
    "                    - 通常使用基尼系数\n",
    "                    - 数据维度大、噪音大时使用基尼系数\n",
    "                    - 纬度低、数据比较清晰时，信息熵和基尼系数没区别\n",
    "                    - 当决策树的拟合程度不够的时候，使用信息熵\n",
    "                    - 两个都试试，不好就换另一个\n",
    "2. splitter，字符型，可不填，默认最佳分枝（best）\n",
    "                * 确定么个节点的分枝策略\n",
    "                * 输入“best”使用最佳分枝\n",
    "                * 输入“random”使用最佳随机分枝\n",
    "3. max_depth，整数或None，可不填，默认None\n",
    "                * 树的最大深度。如果是None，数会持续生长直到所有叶子节点的不纯度为0，\n",
    "                * 或者直到每个叶子节点所含的样本量都小于参数min_samples_split中输入的数字。\n",
    "4. min_samples_split，整数或浮点数，可不填，默认=2\n",
    "                * 一个中间节点要分枝所需要的最小样本量。\n",
    "                * 如果一个节点包含的最小样本量小于min_samples_split中填写的数字，这个节点的分枝就不会发生\n",
    "                * 也就是说，这个节点一定会成为一个叶子节点\n",
    "                    1. 如果输入整数，则认为输入的数字是分枝所需的最小样本量\n",
    "                    2. 如果输入浮点数，则认为输入的浮点数是比例，\n",
    "                        输入的浮点数*输入模型的数据集* 的样本量（n_samples）是分枝所需的最小样本量\n",
    "5. min_samples_leaf，整数或浮点数，可不填，默认=1\n",
    "                * 一个叶子节点要存在所需要的最小样本量。\n",
    "                * 一个节点在分枝后的每个子节点中，必须要包含至少min_samples_leaf各训练样本\n",
    "                * 否则分枝就不会发生。\n",
    "                * 这个参数的效果，尤其是在回归中\n",
    "                    - 如果输入整数，则认为输入的数字是叶节点存在所需的最小样本量\n",
    "                    - 如果输入浮点数，则认为输入的浮点数是比例，\n",
    "                        输入的浮点数的样本量（n_samples）是叶节点所需的最小样本量\n",
    "                    \n",
    "6. min_weight_fraction_leaf，浮点数，可不填，默认=0\n",
    "                * 一个叶节点要存在所需要的权重站输入模型的数据集的总权重的比例\n",
    "                * 总权重由fit接口中的sample_weight参数确定，\n",
    "                * 当sample_weight为None时，默认所有样本的权重相同。\n",
    "7. max_features，整数，浮点数，字符型或None，可不填，默认None\n",
    "                * 在做最佳分枝的时候，考虑到特征个数\n",
    "                    - 输入整数，则每一次分枝都考虑max_features个特征\n",
    "                    - 输入浮点数，则认为输入的浮点数时比例，每次分枝考虑的特征树木是max_features输入模型的数据集的特征个数（n_features）\n",
    "                    - 输入‘auto’，采用n_features的平方根作为分枝时考虑的特征数目；\n",
    "                    - 输入‘sqrt’，采用n_features的平方根作为分枝时考虑的特征数目；\n",
    "                    - 输入‘log2’，采用log2(n_features)作为分枝时考虑的特征数目；\n",
    "                    - 输入‘None’，n_features就是分枝时考虑的特征数目；\n",
    "8. random_state，整数，sklearn中设定好的RandomState实例，或None，可不填，默认None\n",
    "                * 输入整数，random_state是由随机数生成器生成的随机数\n",
    "                * 输入RandomState实例，则random_state是一个随机数生成器\n",
    "                * 输入None，随机数生成器会是np.random模块中的RandomState实例\n",
    "9. max_leaf_nodes，整数或None，可不填，默认NOne\n",
    "                * 最大叶节点数量。\n",
    "                * 在最佳分枝方式下，以max_leaf_nodes为限制来生长树。\n",
    "                * 如果是None则没有节点数量的限制\n",
    "10. min_impurity_decrease，浮点数，可以不填，默认=0\n",
    "                * 当一个节点的分枝后引起的不纯度的降低大于或等于min_inmpurity_decre的值，则这个分枝会被保留，不会被剪枝\n",
    "11. min_impurity_split，浮点数\n",
    "                * 防止书生长的阈值之一。\n",
    "                * 如果一个节点的不纯度高于min_impurity_split，这个节点就会被分枝，否则的话这个节点只能是叶子节点。\n",
    "                * 0.19以上版本中，则个参数被min_impurity_decrease取代，\n",
    "                * 在0.21版本中，这个参数将会被删除，请使用min_impurity_decrease\n",
    "12. class_weight，字典，字典的列表，“balanced”或者None，默认None\n",
    "                * 与标签相关联的权重\n",
    "                * 表示方式是{标签的值：权重}\n",
    "                * 如果为None则默认所有标签持有相同的权重\n",
    "                * 注意：如果制定了sample_weight，这些权重将通过fit接口与sample_weight相乘\n",
    "13. presort，布尔值，可不填，默认False\n",
    "                * 是否预先分配数据以加快拟合中最佳分枝的发现\n",
    "                * 在大型数据集上使用默认设置的决策树时，将这个参数设置为true可能会延迟训练过程\n",
    "                * ，降低训练速度。\n",
    "                * 当使用较小的数据集或限制树的深度时，设置这个参数为true可能会加快训练速度。\n",
    "14. ccp_alpha，"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 重要属性和接口\n",
    "\n",
    "* 属性：\n",
    "    - 是在模型训练之后，能够调用查看的模型的**各种性质**。\n",
    "    - 对决策树来说，最重要的是`feature_importances_`，能够查看各个特征对模型的重要性\n",
    "        - 最重要的是`feature_importances_`\n",
    "        - 最重要的是`feature_importances_`\n",
    "        - 最重要的是`feature_importances_`\n",
    "    - \n",
    "        ```python\n",
    "          clf.classes_  \n",
    "        ```\n",
    "            * 输出一个数组（array)或者一个数组的列表（list），结构为标签的\n",
    "    - \n",
    "        ```python\n",
    "        clf.max_features_  \n",
    "        ```\n",
    "            * 输出整数\n",
    "            * 参数max_features的推断值\n",
    "    - \n",
    "        ```python\n",
    "        clf.n_classes_\n",
    "        ```\n",
    "            * 输出整数或列表\n",
    "            * 标签类别的数据\n",
    "    - \n",
    "        ```python\n",
    "        clf.n_features_\n",
    "        ```\n",
    "            * 在训练模型（fit）时使用的特征个数    \n",
    "    - \n",
    "        ```python\n",
    "        clf.n_outputs_\n",
    "        ```\n",
    "            * 在训练模型（fit）时输出的结果的个数\n",
    "    - \n",
    "        ```python\n",
    "        clf.tree_\n",
    "        ```\n",
    "            * 输出一个可以导出剑豪的树结构的端口，通过这个端口，可以访问树的结构和低级属性，包括但不限于查看：\n",
    "                1. 二叉树的结构\n",
    "                2. 每个节点的深度以及它是否是叶子\n",
    "                3. 使用decision_path方法的示例到达的节点\n",
    "                4. 用apply这个接口取样出的叶子\n",
    "                5. 用于预测样本的规则\n",
    "                6. 一组样本共享的决策路径\n",
    "            \n",
    "* 四个接口\n",
    "    \n",
    "    - 所有接口中要输入X_train和X_test的部分，输入的特征矩阵**至少是二维矩阵**。\n",
    "        - **sklearn不接受任何一维矩阵作为特征矩阵被输入**\n",
    "        - 如果数据的确只有一个特征，必须要reshape(-1,1)来给矩阵**增维**\n",
    "        - 如果数据只有一个特征和一个样本，使用reshape(1,-1)给数据**增维**\n",
    "    - `apply`,`predict`，只输入测试集的**特征**\n",
    "    \n",
    "        -     \n",
    "         ```python\n",
    "         clf.apply(X_test)\n",
    "         ```\n",
    "             * apply返回每个测试样本所在叶子节点的**索引**\n",
    "\n",
    "        - \n",
    "        ```python\n",
    "        clf.predict(X_test)\n",
    "        ```\n",
    "            * predict返回每个测试集样本的分类/回归**结果**\n",
    "        \n",
    "    - `fit`,`score`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([10, 10,  4,  4,  4, 10, 13, 10,  4, 11, 13, 13,  4, 10,  4,  4, 13,\n",
       "       10, 10, 10, 13, 10, 10,  4, 13, 13,  4,  4, 13, 13,  4, 10, 13, 13,\n",
       "       10, 10, 10, 13, 10, 13,  6,  4, 10,  4, 13,  4,  4, 11, 10, 11, 13,\n",
       "       10,  4, 10], dtype=int64)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf.apply(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 1, 2, 2, 2, 1, 0, 1, 2, 0, 0, 0, 2, 1, 2, 2, 0, 1, 1, 1, 0, 1,\n",
       "       1, 2, 0, 0, 2, 2, 0, 0, 2, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 2, 1, 2,\n",
       "       0, 2, 2, 0, 1, 0, 0, 1, 2, 1])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
